Search | BVS Regional Portal

1.

Structural and genetic diversity in the secreted mucins, MUC5AC and MUC5B.

Plender, Elizabeth G; Prodanov, Timofey; Hsieh, PingHsun; Nizamis, Evangelos; Harvey, William T; Sulovari, Arvis; Munson, Katherine M; Kaufman, Eli J; O'Neal, Wanda K; Valdmanis, Paul N; Marschall, Tobias; Bloom, Jesse D; Eichler, Evan E.

bioRxiv ; 2024 Mar 20.

Article in English | MEDLINE | ID: mdl-38562829

ABSTRACT

The secreted mucins MUC5AC and MUC5B play critical defensive roles in airway pathogen entrapment and mucociliary clearance by encoding large glycoproteins with variable number tandem repeats (VNTRs). These polymorphic and degenerate protein coding VNTRs make the loci difficult to investigate with short reads. We characterize the structural diversity of MUC5AC and MUC5B by long-read sequencing and assembly of 206 human and 20 nonhuman primate (NHP) haplotypes. We find that human MUC5B is largely invariant (5761-5762aa); however, seven haplotypes have expanded VNTRs (6291-7019aa). In contrast, 30 allelic variants of MUC5AC encode 16 distinct proteins (5249-6325aa) with cysteine-rich domain and VNTR copy number variation. We grouped MUC5AC alleles into three phylogenetic clades: H1 (46%, ~5654aa), H2 (33%, ~5742aa), and H3 (7%, ~6325aa). The two most common human MUC5AC variants are smaller than NHP gene models, suggesting a reduction in protein length during recent human evolution. Linkage disequilibrium (LD) and Tajima's D analyses reveal that East Asians carry exceptionally large MUC5AC LD blocks with an excess of rare variation (p<0.05). To validate this result, we used Locityper for genotyping MUC5AC haplogroups in 2,600 unrelated samples from the 1000 Genomes Project. We observed signatures of positive selection in H1 and H2 among East Asians and a depletion of the likely ancestral haplogroup (H3). In Africans and Europeans, H3 alleles show an excess of common variation and deviate from Hardy-Weinberg equilibrium, consistent with heterozygote advantage and balancing selection. This study provides a generalizable strategy to characterize complex protein coding VNTRs for improved disease associations.
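
The Hardy-Weinberg deviation reported in the abstract above can be illustrated with a minimal sketch. This is not the authors' analysis: it is a generic chi-square goodness-of-fit test for a single biallelic locus (for example, one haplogroup versus all others, which is an assumed framing), and the genotype counts are hypothetical.

```python
# Minimal sketch: chi-square test for Hardy-Weinberg equilibrium (HWE)
# at a biallelic locus. Function name and counts are illustrative only.

CHI2_CRIT_DF1_05 = 3.841  # 5% critical value, chi-square with 1 degree of freedom


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, expected_counts, heterozygote_excess) for genotype counts."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("at least one genotype must be observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Sum (O - E)^2 / E, skipping classes with zero expectation.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    het_excess = n_ab > expected[1]
    return chi2, expected, het_excess


if __name__ == "__main__":
    # Hypothetical counts: AA, AB, BB
    chi2, expected, het_excess = hwe_chi_square(10, 80, 10)
    print("chi2:", chi2)
    print("expected counts:", expected)
    print("deviates at 5% level:", chi2 > CHI2_CRIT_DF1_05)
    print("heterozygote excess:", het_excess)
```

A statistic above the 5% critical value combined with more heterozygotes than expected is the kind of pattern that is loosely consistent with heterozygote advantage; a real analysis would also need to account for population structure and multiple alleles.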

2.

Structurally divergent and recurrently mutated regions of primate genomes.

Mao, Yafei; Harvey, William T; Porubsky, David; Munson, Katherine M; Hoekzema, Kendra; Lewis, Alexandra P; Audano, Peter A; Rozanski, Allison; Yang, Xiangyu; Zhang, Shilong; Yoo, DongAhn; Gordon, David S; Fair, Tyler; Wei, Xiaoxi; Logsdon, Glennis A; Haukness, Marina; Dishuck, Philip C; Jeong, Hyeonsoo; Del Rosario, Ricardo; Bauer, Vanessa L; Fattor, Will T; Wilkerson, Gregory K; Mao, Yuxiang; Shi, Yongyong; Sun, Qiang; Lu, Qing; Paten, Benedict; Bakken, Trygve E; Pollen, Alex A; Feng, Guoping; Sawyer, Sara L; Warren, Wesley C; Carbone, Lucia; Eichler, Evan E.

Cell ; 187(6): 1547-1562.e13, 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38428424

ABSTRACT

We sequenced and assembled using multiple long-read sequencing technologies the genomes of chimpanzee, bonobo, gorilla, orangutan, gibbon, macaque, owl monkey, and marmoset. We identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. We estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs across primate evolution. We identify 1,607 structurally divergent regions wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (e.g., CARD, C4, and OLAH gene families) and additional lineage-specific genes are generated (e.g., CKAP2, VPS36, ACBD7, and NEK5 paralogs), becoming targets of rapid chromosomal diversification and positive selection (e.g., RGPD gene family). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species.

Subjects

Genome; Primates; Animals; Humans; Base Sequence; Primates/classification; Primates/genetics; Biological Evolution; Sequence Analysis, DNA; Genomic Structural Variation

3.

Complete chromosome 21 centromere sequences from a Down syndrome family reveal size asymmetry and differences in kinetochore attachment.

Mastrorosa, F Kumara; Rozanski, Allison N; Harvey, William T; Knuth, Jordan; Garcia, Gage; Munson, Katherine M; Hoekzema, Kendra; Logsdon, Glennis A; Eichler, Evan E.

bioRxiv ; 2024 Feb 26.

Article in English | MEDLINE | ID: mdl-38464314

ABSTRACT

Down syndrome is the most common form of human intellectual disability caused by precocious segregation and nondisjunction of chromosome 21. Differences in centromere structure have been hypothesized to play a potential role in this process in addition to the well-established risk of advancing maternal age. Using long-read sequencing, we completely sequenced and assembled the centromeres from a parent-child trio where Trisomy 21 arose in the child as a result of a meiosis I error. The proband carries three distinct chromosome 21 centromere haplotypes that vary by 11-fold in length--both the largest (H1) and smallest (H2) originating from the mother. The longest H1 allele harbors a less clearly defined centromere dip region (CDR) as defined by CpG methylation and a significantly reduced signal by CENP-A chromatin immunoprecipitation sequencing when compared to H2 or paternal H3 centromeres. These epigenetic signatures suggest less competent kinetochore attachment for the maternally transmitted H1. Analysis of H1 in the mother indicates that the reduced CENP-A ChIP-seq signal, but not the CDR profile, pre-existed the meiotic nondisjunction event. A comparison of the three proband centromeres to a population sampling of 35 completely sequenced chromosome 21 centromeres shows that H2 is the smallest centromere sequenced to date and all three haplotypes (H1-H3) share a common origin of ~15 thousand years ago. These results suggest that recent asymmetry in size and epigenetic differences of chromosome 21 centromeres may contribute to nondisjunction risk.

4.

The Complete Sequence and Comparative Analysis of Ape Sex Chromosomes.

Makova, Kateryna D; Pickett, Brandon D; Harris, Robert S; Hartley, Gabrielle A; Cechova, Monika; Pal, Karol; Nurk, Sergey; Yoo, DongAhn; Li, Qiuhui; Hebbar, Prajna; McGrath, Barbara C; Antonacci, Francesca; Aubel, Margaux; Biddanda, Arjun; Borchers, Matthew; Bomberg, Erich; Bouffard, Gerard G; Brooks, Shelise Y; Carbone, Lucia; Carrel, Laura; Carroll, Andrew; Chang, Pi-Chuan; Chin, Chen-Shan; Cook, Daniel E; Craig, Sarah J C; de Gennaro, Luciana; Diekhans, Mark; Dutra, Amalia; Garcia, Gage H; Grady, Patrick G S; Green, Richard E; Haddad, Diana; Hallast, Pille; Harvey, William T; Hickey, Glenn; Hillis, David A; Hoyt, Savannah J; Jeong, Hyeonsoo; Kamali, Kaivan; Kosakovsky Pond, Sergei L; LaPolice, Troy M; Lee, Charles; Lewis, Alexandra P; Loh, Yong-Hwee E; Masterson, Patrick; McCoy, Rajiv C; Medvedev, Paul; Miga, Karen H; Munson, Katherine M; Pak, Evgenia.

bioRxiv ; 2023 Dec 01.

Article in English | MEDLINE | ID: mdl-38077089

ABSTRACT

Apes possess two sex chromosomes: the male-specific Y and the X shared by males and females. The Y chromosome is crucial for male reproduction, with deletions linked to infertility. The X chromosome carries genes vital for reproduction and cognition. Variation in mating patterns and brain function among great apes suggests corresponding differences in their sex chromosome structure and evolution. However, due to their highly repetitive nature and incomplete reference assemblies, ape sex chromosomes have been challenging to study. Here, using the state-of-the-art experimental and computational methods developed for the telomere-to-telomere (T2T) human genome, we produced gapless, complete assemblies of the X and Y chromosomes for five great apes (chimpanzee, bonobo, gorilla, Bornean and Sumatran orangutans) and a lesser ape, the siamang gibbon. These assemblies completely resolved ampliconic, palindromic, and satellite sequences, including the entire centromeres, allowing us to untangle the intricacies of ape sex chromosome evolution. We found that, compared to the X, ape Y chromosomes vary greatly in size and have low alignability and high levels of structural rearrangements. This divergence on the Y arises from the accumulation of lineage-specific ampliconic regions and palindromes (which are shared more broadly among species on the X) and from the abundance of transposable elements and satellites (which have a lower representation on the X). Our analysis of Y chromosome genes revealed lineage-specific expansions of multi-copy gene families and signatures of purifying selection. In summary, the Y exhibits dynamic evolution, while the X is more stable. Finally, mapping short-read sequencing data from >100 great ape individuals revealed the patterns of diversity and selection on their sex chromosomes, demonstrating the utility of these reference assemblies for studies of great ape evolution. These complete sex chromosome assemblies are expected to further inform conservation genetics of nonhuman apes, all of which are endangered species.

5.

Common lizard microhabitat selection varies by sex, parity mode, and colouration.

Recknagel, Hans; Harvey, William T; Layton, Megan; Elmer, Kathryn R.

BMC Ecol Evol ; 23(1): 47, 2023 Sep 04.

Article in English | MEDLINE | ID: mdl-37667183

ABSTRACT

BACKGROUND: Animals select and interact with their environment in various ways, including to ensure their physiology is at its optimal capacity, access to prey is possible, and predators can be avoided. Often conflicting, the balance of choices made may vary depending on an individual's life-history and condition. The common lizard (Zootoca vivipara) has egg-laying and live-bearing lineages and displays a variety of dorsal patterns and colouration. How colouration and reproductive mode affect habitat selection decisions on the landscape is not known. In this study, we first tested if co-occurring male and female viviparous and oviparous common lizards differ in their microhabitat selection. Second, we tested if the dorsal colouration of an individual lizard matched its basking site choice within the microhabitat where it was encountered, which could be related to camouflage and crypsis. RESULTS: We found that site use differed from the habitat otherwise available, suggesting lizards actively choose the composition and structure of their microhabitat. Females were found in areas with more wood and less bare ground compared to males; we speculate that this may be for better camouflage and reducing predation risk during pregnancy, when females are less mobile. Microhabitat use also differed by parity mode: viviparous lizards were found in areas with more density of flowering plants, while oviparous lizards were found in areas that were wetter and had more moss. This may relate to differing habitat preferences of viviparous vs. oviparous for clutch lay sites. We found that an individual's dorsal colouration matched that of the substrate of its basking site. This could indicate that individuals may choose their basking site to optimise camouflage within microhabitat. Further, all individuals were found basking in areas close to cover, which we expect could be used to escape predation. 
CONCLUSIONS: Our study suggests that common lizards may actively choose their microhabitat and basking site, balancing physiological requirements, escape response and camouflage as a tactic for predator avoidance. This varies for parity modes, sexes, and dorsal colourations, suggesting that individual optimisation strategies are influenced by inter-individual variation within populations as well as determined by evolutionary differences associated with life history.

Subjects

Lizards; Reproduction; Animals; Female; Male; Biological Evolution; Ecosystem; Pigmentation

6.

Assembly of 43 human Y chromosomes reveals extensive complexity and variation.

Hallast, Pille; Ebert, Peter; Loftus, Mark; Yilmaz, Feyza; Audano, Peter A; Logsdon, Glennis A; Bonder, Marc Jan; Zhou, Weichen; Höps, Wolfram; Kim, Kwondo; Li, Chong; Hoyt, Savannah J; Dishuck, Philip C; Porubsky, David; Tsetsos, Fotios; Kwon, Jee Young; Zhu, Qihui; Munson, Katherine M; Hasenfeld, Patrick; Harvey, William T; Lewis, Alexandra P; Kordosky, Jennifer; Hoekzema, Kendra; O'Neill, Rachel J; Korbel, Jan O; Tyler-Smith, Chris; Eichler, Evan E; Shi, Xinghua; Beck, Christine R; Marschall, Tobias; Konkel, Miriam K; Lee, Charles.

Nature ; 621(7978): 355-364, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date [1] and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes [2]. Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established [1] boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.

Subjects

Chromosomes, Human, Y; Evolution, Molecular; Humans; Male; Chromosomes, Human, Y/genetics; Genome, Human/genetics; Genomics; Mutation Rate; Phenotype; Euchromatin/genetics; Pseudogenes; Genetic Variation/genetics; Chromosomes, Human, X/genetics; Pseudoautosomal Regions/genetics

7.

The complete sequence of a human Y chromosome.

Rhie, Arang; Nurk, Sergey; Cechova, Monika; Hoyt, Savannah J; Taylor, Dylan J; Altemose, Nicolas; Hook, Paul W; Koren, Sergey; Rautiainen, Mikko; Alexandrov, Ivan A; Allen, Jamie; Asri, Mobin; Bzikadze, Andrey V; Chen, Nae-Chyun; Chin, Chen-Shan; Diekhans, Mark; Flicek, Paul; Formenti, Giulio; Fungtammasan, Arkarachai; Garcia Giron, Carlos; Garrison, Erik; Gershman, Ariel; Gerton, Jennifer L; Grady, Patrick G S; Guarracino, Andrea; Haggerty, Leanne; Halabian, Reza; Hansen, Nancy F; Harris, Robert; Hartley, Gabrielle A; Harvey, William T; Haukness, Marina; Heinz, Jakob; Hourlier, Thibaut; Hubley, Robert M; Hunt, Sarah E; Hwang, Stephen; Jain, Miten; Kesharwani, Rupesh K; Lewis, Alexandra P; Li, Heng; Logsdon, Glennis A; Lucas, Julian K; Makalowski, Wojciech; Markovic, Christopher; Martin, Fergal J; Mc Cartney, Ann M; McCoy, Rajiv C; McDaniel, Jennifer; McNulty, Brandy M.

Nature ; 621(7978): 344-354, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612512

ABSTRACT

The human Y chromosome has been notoriously difficult to sequence and assemble because of its complex repeat structure that includes long palindromes, tandem repeats and segmental duplications [1-3]. As a result, more than half of the Y chromosome is missing from the GRCh38 reference sequence and it remains the last human chromosome to be finished [4,5]. Here, the Telomere-to-Telomere (T2T) consortium presents the complete 62,460,029-base-pair sequence of a human Y chromosome from the HG002 genome (T2T-Y) that corrects multiple errors in GRCh38-Y and adds over 30 million base pairs of sequence to the reference, showing the complete ampliconic structures of gene families TSPY, DAZ and RBMY; 41 additional protein-coding genes, mostly from the TSPY family; and an alternating pattern of human satellite 1 and 3 blocks in the heterochromatic Yq12 region. We have combined T2T-Y with a previous assembly of the CHM13 genome [4] and mapped available population variation, clinical variants and functional genomics data to produce a complete and comprehensive reference sequence for all 24 human chromosomes.

Subjects

Chromosomes, Human, Y; Genomics; Sequence Analysis, DNA; Humans; Base Sequence; Chromosomes, Human, Y/genetics; DNA, Satellite/genetics; Genetic Variation/genetics; Genetics, Population; Genomics/methods; Genomics/standards; Heterochromatin/genetics; Multigene Family/genetics; Reference Standards; Segmental Duplications, Genomic/genetics; Sequence Analysis, DNA/standards; Tandem Repeat Sequences/genetics; Telomere/genetics

8.

The SARS-CoV-2 Spike Protein Mutation Explorer: using an interactive application to improve the public understanding of SARS-CoV-2 variants of concern.

Iannucci, Sarah; Harvey, William T; Hughes, Joseph; Robertson, David L; Poyade, Matthieu; Hutchinson, Edward.

J Vis Commun Med ; 46(3): 122-132, 2023 Jul.

Article in English | MEDLINE | ID: mdl-37526402

ABSTRACT

Due to the COVID-19 pandemic the virus responsible, SARS-CoV-2, became a source of intense interest for non-expert audiences. The viral spike protein gained particular public interest as the main target for protective immune responses, including those elicited by vaccines. The rapid evolution of SARS-CoV-2 resulted in variations in the spike that enhanced transmissibility or weakened vaccine protection. This created new variants of concern (VOCs). The emergence of VOCs was studied using viral sequence data which was shared through portals such as the online Mutation Explorer of the COVID-19 Genomics UK consortium (COG-UK/ME). This was designed for an expert audience, but the information it contained could be of general interest if suitably communicated. Visualisations, interactivity and animation can improve engagement and understanding of molecular biology topics, and so we developed a graphical educational resource, the SARS-CoV-2 Spike Protein Mutation Explorer (SSPME), which used interactive 3D molecular models and animations to explain the molecular biology underpinning VOCs. User testing showed that the SSPME had better usability and improved participant knowledge confidence and knowledge acquisition compared to COG-UK/ME. This demonstrates how interactive visualisations can be used for effective molecular biology communication, as well as improving the public understanding of SARS-CoV-2 VOCs.

Subjects

COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; Spike Glycoprotein, Coronavirus/genetics; Pandemics; Mutation

9.

Characterization of large-scale genomic differences in the first complete human genome.

Yang, Xiangyu; Wang, Xuankai; Zou, Yawen; Zhang, Shilong; Xia, Manying; Fu, Lianting; Vollger, Mitchell R; Chen, Nae-Chyun; Taylor, Dylan J; Harvey, William T; Logsdon, Glennis A; Meng, Dan; Shi, Junfeng; McCoy, Rajiv C; Schatz, Michael C; Li, Weidong; Eichler, Evan E; Lu, Qing; Mao, Yafei.

Genome Biol ; 24(1): 157, 2023 Jul 04.

Article in English | MEDLINE | ID: mdl-37403156

ABSTRACT

BACKGROUND: The first telomere-to-telomere (T2T) human genome assembly (T2T-CHM13) release is a milestone in human genomics. The T2T-CHM13 genome assembly extends our understanding of telomeres, centromeres, segmental duplication, and other complex regions. The current human genome reference (GRCh38) has been widely used in various human genomic studies. However, the large-scale genomic differences between these two important genome assemblies are not characterized in detail yet. RESULTS: Here, in addition to the previously reported "non-syntenic" regions, we find 67 additional large-scale discrepant regions and precisely categorize them into four structural types with a newly developed website tool called SynPlotter. The discrepant regions (~21.6 Mbp) excluding telomeric and centromeric regions are highly structurally polymorphic in humans, where the deletions or duplications are likely associated with various human diseases, such as immune and neurodevelopmental disorders. The analyses of a newly identified discrepant region, the KLRC gene cluster, show that the depletion of KLRC2 by a single-deletion event is associated with natural killer cell differentiation in ~20% of humans. Meanwhile, the rapid amino acid replacements observed within KLRC3 are probably a result of natural selection in primate evolution. CONCLUSION: Our study provides a foundation for understanding the large-scale structural genomic differences between the two crucial human reference genomes, and is thereby important for future human genomics studies.

Subjects

Genome, Human; Genomics; Animals; Humans; Segmental Duplications, Genomic; Multigene Family; Centromere/genetics; NK Cell Lectin-Like Receptor Subfamily C/genetics

10.

Gaps and complex structurally variant loci in phased genome assemblies.

Porubsky, David; Vollger, Mitchell R; Harvey, William T; Rozanski, Allison N; Ebert, Peter; Hickey, Glenn; Hasenfeld, Patrick; Sanders, Ashley D; Stober, Catherine; Korbel, Jan O; Paten, Benedict; Marschall, Tobias; Eichler, Evan E.

Genome Res ; 33(4): 496-510, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.

Subjects

DNA, Satellite; Polymorphism, Genetic; Humans; DNA, Satellite/genetics; Haplotypes; Segmental Duplications, Genomic; Sequence Analysis, DNA

11.

Increased mutation and gene conversion within human segmental duplications.

Vollger, Mitchell R; Dishuck, Philip C; Harvey, William T; DeWitt, William S; Guitart, Xavi; Goldberg, Michael E; Rozanski, Allison N; Lucas, Julian; Asri, Mobin; Munson, Katherine M; Lewis, Alexandra P; Hoekzema, Kendra; Logsdon, Glennis A; Porubsky, David; Paten, Benedict; Harris, Kelley; Hsieh, PingHsun; Eichler, Evan E.

Nature ; 617(7960): 325-334, 2023 May.

Article in English | MEDLINE | ID: mdl-37165237

ABSTRACT

Single-nucleotide variants (SNVs) in segmental duplications (SDs) have not been systematically assessed because of the limitations of mapping short-read sequencing data [1,2]. Here we constructed 1:1 unambiguous alignments spanning high-identity SDs across 102 human haplotypes and compared the pattern of SNVs between unique and duplicated regions [3,4]. We find that human SNVs are elevated 60% in SDs compared to unique regions and estimate that at least 23% of this increase is due to interlocus gene conversion (IGC) with up to 4.3 megabase pairs of SD sequence converted on average per human haplotype. We develop a genome-wide map of IGC donors and acceptors, including 498 acceptor and 454 donor hotspots affecting the exons of about 800 protein-coding genes. These include 171 genes that have 'relocated' on average 1.61 megabase pairs in a subset of human haplotypes. Using a coalescent framework, we show that SD regions are slightly evolutionarily older when compared to unique sequences, probably owing to IGC. SNVs in SDs, however, show a distinct mutational spectrum: a 27.1% increase in transversions that convert cytosine to guanine or the reverse across all triplet contexts and a 7.6% reduction in the frequency of CpG-associated mutations when compared to unique DNA. We reason that these distinct mutational properties help to maintain an overall higher GC content of SD DNA compared to that of unique DNA, probably driven by GC-biased conversion between paralogous sequences [5,6].

Subjects

Gene Conversion; Mutation; Segmental Duplications, Genomic; Humans; Gene Conversion/genetics; Genome, Human/genetics; Polymorphism, Single Nucleotide/genetics; Haplotypes/genetics; Exons/genetics; Cytosine/chemistry; Guanine/chemistry; CpG Islands/genetics

12.

A draft human pangenome reference.

Liao, Wen-Wei; Asri, Mobin; Ebler, Jana; Doerr, Daniel; Haukness, Marina; Hickey, Glenn; Lu, Shuangjia; Lucas, Julian K; Monlong, Jean; Abel, Haley J; Buonaiuto, Silvia; Chang, Xian H; Cheng, Haoyu; Chu, Justin; Colonna, Vincenza; Eizenga, Jordan M; Feng, Xiaowen; Fischer, Christian; Fulton, Robert S; Garg, Shilpa; Groza, Cristian; Guarracino, Andrea; Harvey, William T; Heumos, Simon; Howe, Kerstin; Jain, Miten; Lu, Tsung-Yu; Markello, Charles; Martin, Fergal J; Mitchell, Matthew W; Munson, Katherine M; Mwaniki, Moses Njagi; Novak, Adam M; Olsen, Hugh E; Pesout, Trevor; Porubsky, David; Prins, Pjotr; Sibbesen, Jonas A; Sirén, Jouni; Tomlinson, Chad; Villani, Flavia; Vollger, Mitchell R; Antonacci-Fulton, Lucinda L; Baid, Gunjan; Baker, Carl A; Belyaeva, Anastasiya; Billis, Konstantinos; Carroll, Andrew; Chang, Pi-Chuan; Cody, Sarah.

Nature ; 617(7960): 312-324, 2023 May.

Article in English | MEDLINE | ID: mdl-37165242

ABSTRACT

Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals [1]. These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.

Subjects

Genome, Human; Genomics; Humans; Diploidy; Genome, Human/genetics; Haplotypes/genetics; Sequence Analysis, DNA; Genomics/standards; Reference Standards; Cohort Studies; Alleles; Genetic Variation

13.

Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall.

Harvey, William T; Ebert, Peter; Ebler, Jana; Audano, Peter A; Munson, Katherine M; Hoekzema, Kendra; Porubsky, David; Beck, Christine R; Marschall, Tobias; Garimella, Kiran; Eichler, Evan E.

bioRxiv ; 2023 May 04.

Article in English | MEDLINE | ID: mdl-37205567

ABSTRACT

Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
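
The precision, recall, and F1 measures referred to in the abstract above follow standard definitions. The sketch below shows how they could be computed from counts of true-positive, false-positive, and false-negative variant calls; the function name and the counts are hypothetical and do not come from the study.

```python
# Minimal sketch: precision, recall, and F1 from variant-call counts.
# tp = calls matching the truth set, fp = calls absent from the truth set,
# fn = truth-set variants that were not called. All values are illustrative.


def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1); each is 0.0 when its denominator is 0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical callset compared against a truth set.
    p, r, f1 = precision_recall_f1(tp=50, fp=50, fn=50)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two values, so a callset cannot reach a high F1 by maximising only precision or only recall.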

14.

Inversion polymorphism in a complete human genome assembly.

Porubsky, David; Harvey, William T; Rozanski, Allison N; Ebler, Jana; Höps, Wolfram; Ashraf, Hufsah; Hasenfeld, Patrick; Paten, Benedict; Sanders, Ashley D; Marschall, Tobias; Korbel, Jan O; Eichler, Evan E.

Genome Biol ; 24(1): 100, 2023 Apr 30.

Article in English | MEDLINE | ID: mdl-37122002

ABSTRACT

The telomere-to-telomere (T2T) complete human reference has significantly improved our ability to characterize genome structural variation. To understand its impact on inversion polymorphisms, we remapped data from 41 genomes against the T2T reference genome and compared it to the GRCh38 reference. We find a ~21% increase in sensitivity improving mapping of 63 inversions on the T2T reference. We identify 26 misorientations within GRCh38 and show that the T2T reference is three times more likely to represent the correct orientation of the major human allele. Analysis of 10 additional samples reveals novel rare inversions at chromosomes 15q25.2, 16p11.2, 16q22.1-23.1, and 22q11.21.

Subjects

Genome, Human; Polymorphism, Genetic; Humans; Genomic Structural Variation; Chromosome Inversion

15.

Structurally divergent and recurrently mutated regions of primate genomes.

Mao, Yafei; Harvey, William T; Porubsky, David; Munson, Katherine M; Hoekzema, Kendra; Lewis, Alexandra P; Audano, Peter A; Rozanski, Allison; Yang, Xiangyu; Zhang, Shilong; Gordon, David S; Wei, Xiaoxi; Logsdon, Glennis A; Haukness, Marina; Dishuck, Philip C; Jeong, Hyeonsoo; Del Rosario, Ricardo; Bauer, Vanessa L; Fattor, Will T; Wilkerson, Gregory K; Lu, Qing; Paten, Benedict; Feng, Guoping; Sawyer, Sara L; Warren, Wesley C; Carbone, Lucia; Eichler, Evan E.

bioRxiv ; 2023 Mar 07.

Article in English | MEDLINE | ID: mdl-36945442

ABSTRACT

To better understand the pattern of primate genome structural variation, we sequenced and assembled using multiple long-read sequencing technologies the genomes of eight nonhuman primate species, including New World monkeys (owl monkey and marmoset), Old World monkey (macaque), Asian apes (orangutan and gibbon), and African ape lineages (gorilla, bonobo, and chimpanzee). Compared to the human genome, we identified 1,338,997 lineage-specific fixed structural variants (SVs) disrupting 1,561 protein-coding genes and 136,932 regulatory elements, including the most complete set of human-specific fixed differences. Across 50 million years of primate evolution, we estimate that 819.47 Mbp or ~27% of the genome has been affected by SVs based on analysis of these primate lineages. We identify 1,607 structurally divergent regions (SDRs) wherein recurrent structural variation contributes to creating SV hotspots where genes are recurrently lost (CARDs, ABCD7, OLAH) and new lineage-specific genes are generated (e.g., CKAP2, NEK5) and have become targets of rapid chromosomal diversification and positive selection (e.g., RGPDs). High-fidelity long-read sequencing has made these dynamic regions of the genome accessible for sequence-level analyses within and between primate species for the first time.

16.

A Bayesian approach to incorporate structural data into the mapping of genotype to antigenic phenotype of influenza A(H3N2) viruses.

Harvey, William T; Davies, Vinny; Daniels, Rodney S; Whittaker, Lynne; Gregory, Victoria; Hay, Alan J; Husmeier, Dirk; McCauley, John W; Reeve, Richard.

PLoS Comput Biol ; 19(3): e1010885, 2023 03.

Article in English | MEDLINE | ID: mdl-36972311

ABSTRACT

Surface antigens of pathogens are commonly targeted by vaccine-elicited antibodies but antigenic variability, notably in RNA viruses such as influenza, HIV and SARS-CoV-2, poses challenges for control by vaccination. For example, influenza A(H3N2) entered the human population in 1968 causing a pandemic and has since been monitored, along with other seasonal influenza viruses, for the emergence of antigenic drift variants through intensive global surveillance and laboratory characterisation. Statistical models of the relationship between genetic differences among viruses and their antigenic similarity provide useful information to inform vaccine development, though accurate identification of causative mutations is complicated by highly correlated genetic signals that arise due to the evolutionary process. Here, using a sparse hierarchical Bayesian analogue of an experimentally validated model for integrating genetic and antigenic data, we identify the genetic changes in influenza A(H3N2) virus that underpin antigenic drift. We show that incorporating protein structural data into variable selection helps resolve ambiguities arising due to correlated signals, with the proportion of variables representing haemagglutinin positions decisively included, or excluded, increased from 59.8% to 72.4%. The accuracy of variable selection judged by proximity to experimentally determined antigenic sites was improved simultaneously. Structure-guided variable selection thus improves confidence in the identification of genetic explanations of antigenic variation and we also show that prioritising the identification of causative mutations is not detrimental to the predictive capability of the analysis. Indeed, incorporating structural information into variable selection resulted in a model that could more accurately predict antigenic assay titres for phenotypically-uncharacterised virus from genetic sequence. Combined, these analyses have the potential to inform choices of reference viruses, the targeting of laboratory assays, and predictions of the evolutionary success of different genotypes, and can therefore be used to inform vaccine selection processes.

Subjects

COVID-19 , Influenza A Virus , Human Influenza , Humans , Human Influenza/prevention & control , Influenza A Virus H3N2 Subtype/genetics , Bayes Theorem , Influenza Virus Hemagglutinin Glycoproteins/genetics , SARS-CoV-2 , Viral Antigens/genetics , Genotype , Phenotype , Antiviral Antibodies/genetics

17.

SARS-CoV-2 variant biology: immune escape, transmission and fitness.

Carabelli, Alessandro M; Peacock, Thomas P; Thorne, Lucy G; Harvey, William T; Hughes, Joseph; Peacock, Sharon J; Barclay, Wendy S; de Silva, Thushan I; Towers, Greg J; Robertson, David L.

Nat Rev Microbiol ; 21(3): 162-177, 2023 03.

Article in English | MEDLINE | ID: mdl-36653446

ABSTRACT

In late 2020, after circulating for almost a year in the human population, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) exhibited a major step change in its adaptation to humans. These highly mutated forms of SARS-CoV-2 had enhanced rates of transmission relative to previous variants and were termed 'variants of concern' (VOCs). Designated Alpha, Beta, Gamma, Delta and Omicron, the VOCs emerged independently from one another, and in turn each rapidly became dominant, regionally or globally, outcompeting previous variants. The success of each VOC relative to the previously dominant variant was enabled by altered intrinsic functional properties of the virus and, to various degrees, changes to virus antigenicity conferring the ability to evade a primed immune response. The increased virus fitness associated with VOCs is the result of a complex interplay of virus biology in the context of changing human immunity due to both vaccination and prior infection. In this Review, we summarize the literature on the relative transmissibility and antigenicity of SARS-CoV-2 variants, the role of mutations at the furin spike cleavage site and of non-spike proteins, the potential importance of recombination to virus success, and SARS-CoV-2 evolution in the context of T cells, innate immunity and population immunity. SARS-CoV-2 shows a complicated relationship among virus antigenicity, transmission and virulence, which has unpredictable implications for the future trajectory and disease burden of COVID-19.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Innate Immunity , Biology

18.

SARS-CoV-2 variant evasion of monoclonal antibodies based on in vitro studies.

Cox, MacGregor; Peacock, Thomas P; Harvey, William T; Hughes, Joseph; Wright, Derek W; Willett, Brian J; Thomson, Emma; Gupta, Ravindra K; Peacock, Sharon J; Robertson, David L; Carabelli, Alessandro M.

Nat Rev Microbiol ; 21(2): 112-124, 2023 02.

Article in English | MEDLINE | ID: mdl-36307535

ABSTRACT

Monoclonal antibodies (mAbs) offer a treatment option for individuals with severe COVID-19 and are especially important in high-risk individuals where vaccination is not an option. Given the importance of understanding the evolution of resistance to mAbs by SARS-CoV-2, we reviewed the available in vitro neutralization data for mAbs against live variants and viral constructs containing spike mutations of interest. Unfortunately, evasion of mAb-induced protection is being reported with new SARS-CoV-2 variants. The magnitude of neutralization reduction varied greatly among mAb-variant pairs. For example, sotrovimab retained its neutralization capacity against Omicron BA.1 but showed reduced efficacy against BA.2, BA.4 and BA.5, and BA.2.12.1. At present, only bebtelovimab has been reported to retain its efficacy against all SARS-CoV-2 variants considered here. Resistance to mAb neutralization was dominated by the action of epitope single amino acid substitutions in the spike protein. Although not all observed epitope mutations result in increased mAb evasion, amino acid substitutions at non-epitope positions and combinations of mutations also contribute to evasion of neutralization. This Review highlights the implications for the rational design of viral genomic surveillance and factors to consider for the development of novel mAb therapies.

Subjects

COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , Monoclonal Antibodies/pharmacology , Amino Acid Substitution , Neutralizing Antibodies , Epitopes , Antiviral Antibodies

19.

Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall.

Harvey, William T; Ebert, Peter; Ebler, Jana; Audano, Peter A; Munson, Katherine M; Hoekzema, Kendra; Porubsky, David; Beck, Christine R; Marschall, Tobias; Garimella, Kiran; Eichler, Evan E.

Genome Res ; 33(12): 2029-2040, 2023 Dec 27.

Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy, and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets, with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.

Subjects

Genomics , Nanopores , INDEL Mutation , Whole Genome Sequencing

20.

Publisher Correction: SARS-CoV-2 Omicron is an immune escape variant with an altered cell entry pathway.

Willett, Brian J; Grove, Joe; MacLean, Oscar A; Wilkie, Craig; De Lorenzo, Giuditta; Furnon, Wilhelm; Cantoni, Diego; Scott, Sam; Logan, Nicola; Ashraf, Shirin; Manali, Maria; Szemiel, Agnieszka; Cowton, Vanessa; Vink, Elen; Harvey, William T; Davis, Chris; Asamaphan, Patawee; Smollett, Katherine; Tong, Lily; Orton, Richard; Hughes, Joseph; Holland, Poppy; Silva, Vanessa; Pascall, David J; Puxty, Kathryn; da Silva Filipe, Ana; Yebra, Gonzalo; Shaaban, Sharif; Holden, Matthew T G; Pinto, Rute Maria; Gunson, Rory; Templeton, Kate; Murcia, Pablo R; Patel, Arvind H; Klenerman, Paul; Dunachie, Susanna; Haughney, John; Robertson, David L; Palmarini, Massimo; Ray, Surajit; Thomson, Emma C.

Nat Microbiol ; 7(10): 1709, 2022 Oct.

Article in English | MEDLINE | ID: mdl-36114232
